A Preliminary Study on Probabilistic Models for Chinese Abbreviations

نویسندگان

  • Jing-Shin Chang
  • Yu-Tso Lai
چکیده

Chinese abbreviations are widely used in the modern Chinese texts. They are a special form of unknown words, including many named entities. This results in difficulty for correct Chinese processing. In this study, the Chinese abbreviation problem is regarded as an error recovery problem in which the suspect root words are the “errors” to be recovered from a set of candidates. Such a problem is mapped to an HMM-based generation model for both abbreviation identification and root word recovery, and is integrated as part of a unified word segmentation model when the input extends to a complete sentence. Two major experiments are conducted to test the abbreviation models. In the first experiment, an attempt is made to guess the abbreviations of the root words. An accuracy rate of 72% is observed. In contrast, a second experiment is conducted to guess the root words from abbreviations. Some submodels could achieve as high as 51% accuracy with the simple HMM-based model. Some quantitative observations against heuristic abbreviation knowledge about Chinese are also observed.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining atomic Chinese abbreviations with a probabilistic single character recovery model

An HMM-based single character recovery (SCR) model is proposed in this paper to extract a large set of atomic abbreviations and their full forms from a text corpus. By an ‘‘atomic abbreviation,’’ it refers to an abbreviated word consisting of a single Chinese character. This task is important since Chinese abbreviations cannot be enumerated exhaustively but the abbreviation process for compound...

متن کامل

Using Probabilistic-Risky Programming Models in Identifying Optimized Pattern of Cultivation under Risk Conditions (Case Study: Shoshtar Region)

Using Telser and Kataoka models of probabilistic-risky mathematical programming, the present research is to determine the optimized pattern of cultivating the agricultural products of Shoshtar region under risky conditions. In order to consider the risk in the mentioned models, time period of agricultural years 1996-1997 till 2004-2005 was taken into account. Results from Telser and Kataoka mod...

متن کامل

Implicational Scaling of Reading Comprehension Construct: Is it Deterministic or Probabilistic?

In English as a Second Language Teaching and Testing situations, it is common to infer about learners’ reading ability based on his or her total score on a reading test. This assumes the unidimensional and reproducible nature of reading items. However, few researches have been conducted to probe the issue through psychometric analyses. In the present study, the IELTS exemplar module C (1994) wa...

متن کامل

ارائه یک مدل احتمالاتی برای توزیع خوردگی یکنواخت در سکوهای ثابت فلزی در خلیج فارس

For structural reliability assessment or risk analysis of aging offshore steel structures, it is essential to have a probabilistic model, which contains specific statistical parameters, and predicts long term corrosion loss as a function of time. The aim of this study is to propose such model for offshore jacket platforms in the Persian Gulf. Field measurements for material loss due to uniform ...

متن کامل

Mining Atomic Chinese Abbreviation Pairs with a Probabilistic Single Character Word Recovery Model

An HMM-based Single Character Recovery (SCR) Model is proposed in this paper to extract a large set of “atomic abbreviation pairs”from a text corpus. By an “atomic abbreviation pair,”it refers to an abbreviated word and its root word (i.e., unabbreviated form) in which the abbreviation is a single Chinese character. This task is important since Chinese abbreviations cannot be enumerated exhaust...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004